feat(coloctapp): dynamic backscraper #1011
Helps solve freelawproject#979. Since old opinions for coloctapp are inside PDFs, this script scrapes the new https://research.coloradojudicial.gov/ search interface, which has a vLex backend.
I think the command to fill the gaps is incorrect:
docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.coloctapp --backscrape-start=09/28/2021 --backscrape-end=02/01/2022
The correct one should be:
docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states_backscrapers.state.coloctapp --backscrape --backscrape-start=09/28/2021 --backscrape-end=02/01/2022
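The `--backscrape-start` and `--backscrape-end` values above are MM/DD/YYYY dates. A backscraper usually walks such a range in smaller windows rather than in one request. The sketch below assumes 30-day windows; the helpers `parse_flag_date` and `date_windows` and the window size are hypothetical illustrations, not juriscraper or CourtListener code.

```python
from datetime import date, datetime, timedelta


def parse_flag_date(value: str) -> date:
    """Parse a MM/DD/YYYY flag value such as '09/28/2021' (hypothetical helper)."""
    return datetime.strptime(value, "%m/%d/%Y").date()


def date_windows(start: date, end: date, days: int = 30):
    """Yield contiguous (window_start, window_end) pairs covering [start, end].

    Hypothetical helper: the 30-day window size is an assumption.
    """
    current = start
    step = timedelta(days=days)
    while current <= end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(current + step - timedelta(days=1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


if __name__ == "__main__":
    start = parse_flag_date("09/28/2021")
    end = parse_flag_date("02/01/2022")
    for lo, hi in date_windows(start, end):
        print(lo.isoformat(), hi.isoformat())
```

Windows are inclusive and contiguous, so a gap range like 09/28/2021 to 02/01/2022 is covered without overlaps or missing days.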
I tried to run it several times but it always gives me the same error:
Traceback (most recent call last):
  File "/home/quevon24/PycharmProjects/juriscraper/sample_caller.py", line 253, in main
    site.parse()
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/AbstractSite.py", line 145, in parse
    self.__setattr__(attr, getattr(self, f"_get_{attr}")())
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/OpinionSiteLinear.py", line 29, in _get_case_dates
    return [convert_date_string(case["date"]) for case in self.cases]
  File "/home/quevon24/PycharmProjects/juriscraper/juriscraper/OpinionSiteLinear.py", line 29, in <listcomp>
    return [convert_date_string(case["date"]) for case in self.cases]
KeyError: 'date'
Let me know if you want me to try anything specific 👍
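The traceback shows `_get_case_dates` in `OpinionSiteLinear.py` reading `case["date"]` from every dict in `self.cases`, so `KeyError: 'date'` means at least one case dict was built without a `date` key. A minimal standalone reproduction follows; `convert_date_string` here is a simplified stand-in that assumes ISO `YYYY-MM-DD` input (not juriscraper's real helper), and `missing_date_keys`, the `name`/`url` fields, and the sample cases are hypothetical.

```python
from datetime import date, datetime


def convert_date_string(value: str) -> date:
    # Simplified stand-in for juriscraper's helper; assumes 'YYYY-MM-DD' input.
    return datetime.strptime(value, "%Y-%m-%d").date()


def get_case_dates(cases):
    # Mirrors the list comprehension from the traceback.
    return [convert_date_string(case["date"]) for case in cases]


def missing_date_keys(cases):
    """Return indexes of case dicts lacking a 'date' key (hypothetical debugging helper)."""
    return [i for i, case in enumerate(cases) if "date" not in case]


cases = [
    {"name": "Case A", "date": "2021-10-07", "url": "https://example.com/a.pdf"},
    {"name": "Case B", "url": "https://example.com/b.pdf"},  # 'date' never set
]

print(missing_date_keys(cases))  # [1]
try:
    get_case_dates(cases)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'date'
```

Checking for missing keys before `parse()` runs points directly at the search results that the scraper failed to map to a date.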
2aa6737 to 3c63257
Colorado Courts have changed their site; the old site is no longer available and the scrapers won't work. Helps solve freelawproject#1062 and freelawproject#979.
3c63257 to ea8924c
The old Colorado Courts sources no longer work, so this PR solves both the gap issue and the need for a new scraper. @quevon24, please check this again.
All good 👍